feat(runtime): add `RuntimeTracer` trait for task instrumentation by xanderbailey · Pull Request #2467 · apache/iceberg-rust

xanderbailey · 2026-05-19T12:01:27Z

Allows callers to inject observability (tracing spans, metrics, etc.) into spawned tasks without modifying spawn sites. Inspired by DataFusion's JoinSetTracer but scoped per-Runtime rather than global, enabling different tracers for IO vs CPU handles.

Why `RuntimeTracer` instead of tokio's native tracing?

iceberg-rust is a library, so it shouldn't dictate the observability stack. `RuntimeTracer` provides a spawn-site hook that lets consumers inject their own instrumentation without coupling the library to any specific tracing crate.

| Concern | tokio tracing | `RuntimeTracer` |
| --- | --- | --- |
| Dependency | Requires `tracing` crate in the library | Zero added dependencies |
| Stack choice | Tied to `tracing` ecosystem (subscribers, layers) | Stack-neutral: implement with `tracing`, OTel, Prometheus, `log`, or anything else |
| Split runtimes | No concept of IO vs CPU categorization | Hooks are attached per-`RuntimeHandle`, so IO and CPU work are distinguishable |
| Control | Library authors decide what to instrument | Library consumers decide what to instrument |

The two approaches are complementary: a consumer could implement RuntimeTracer to attach a tracing::Span, getting tokio-console visibility and library-level spawn-site context without iceberg taking an opinion.
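To make the per-handle idea concrete, here is a minimal, std-only sketch. It is not the PR's actual API: `Tracer`, `CountingTracer`, `Handle`, and `count_by_handle` are hypothetical names, only the blocking hook is modelled, and `std::thread::spawn` stands in for a runtime's blocking pool. The closure type follows the `trace_block` signature quoted in the review thread.

```rust
use std::any::Any;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Type-erased blocking closure (same shape as the quoted `trace_block` argument).
type BoxedBlock = Box<dyn FnOnce() -> Box<dyn Any + Send> + Send>;

/// Hypothetical, simplified tracer trait: only the blocking hook is shown.
trait Tracer: Send + Sync {
    /// Wraps the closure; must not alter its return value.
    fn trace_block(&self, f: BoxedBlock) -> BoxedBlock;
}

/// Example tracer that counts how many wrapped closures actually ran.
struct CountingTracer {
    ran: Arc<AtomicUsize>,
}

impl Tracer for CountingTracer {
    fn trace_block(&self, f: BoxedBlock) -> BoxedBlock {
        let ran = Arc::clone(&self.ran);
        Box::new(move || {
            ran.fetch_add(1, Ordering::SeqCst);
            f() // return value is passed through untouched
        })
    }
}

/// Hypothetical handle: one category of work (IO or CPU) with its own tracer.
struct Handle {
    tracer: Option<Arc<dyn Tracer>>,
}

impl Handle {
    /// The spawn site applies the tracer, so callers do not change.
    /// `std::thread::spawn` is an assumption standing in for a blocking pool.
    fn spawn_blocking(&self, f: BoxedBlock) -> thread::JoinHandle<Box<dyn Any + Send>> {
        let f = match &self.tracer {
            Some(tracer) => tracer.trace_block(f),
            None => f,
        };
        thread::spawn(f)
    }
}

/// Runs two IO tasks and one CPU task; returns (io count, cpu count, cpu result).
fn count_by_handle() -> (usize, usize, i32) {
    let io_count = Arc::new(AtomicUsize::new(0));
    let cpu_count = Arc::new(AtomicUsize::new(0));
    let io = Handle {
        tracer: Some(Arc::new(CountingTracer { ran: Arc::clone(&io_count) })),
    };
    let cpu = Handle {
        tracer: Some(Arc::new(CountingTracer { ran: Arc::clone(&cpu_count) })),
    };

    io.spawn_blocking(Box::new(|| Box::new(1_i32) as Box<dyn Any + Send>))
        .join()
        .unwrap();
    io.spawn_blocking(Box::new(|| Box::new(2_i32) as Box<dyn Any + Send>))
        .join()
        .unwrap();
    let out = cpu
        .spawn_blocking(Box::new(|| Box::new(40_i32 + 2) as Box<dyn Any + Send>))
        .join()
        .unwrap();

    let value = *out.downcast::<i32>().unwrap();
    (
        io_count.load(Ordering::SeqCst),
        cpu_count.load(Ordering::SeqCst),
        value,
    )
}
```

Because each handle carries its own tracer, the IO and CPU counters stay separate without any change at the call sites. A real implementation would more likely open a span or record a metric than bump a counter.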

Which issue does this PR close?

Closes #2468 (Add tracing to the new runtime).

What changes are included in this PR?

Are these changes tested?


xanderbailey · 2026-05-19T12:03:17Z

@CTTY I'd be interested in your thoughts here.

CTTY · 2026-05-21T21:29:50Z

Hi Xander,

Thanks for having this!
I also thought about this before and was thinking that the tracing part could be deferred to the custom implementation once we made Runtime itself a trait. But I don't have a strong opinion here.

Curious to hear what others think as well

xanderbailey · 2026-05-21T21:52:01Z

Makes sense. I thought about this too: the trait is designed to be agnostic of the concrete runtime (tokio), in the hope that it wouldn’t need to change. It just wraps futures in a generic way.

Is the worry that we don’t know if that’s actually true until we design a runtime trait?

Kurtiscwright · 2026-05-28T22:38:24Z

+        self.io.tracer = Some(tracer.clone());
+        self.cpu.tracer = Some(tracer);


Why is the io.tracer cloned, but the cpu tracer isn't?

xanderbailey · 2026-05-29

You need the tracer twice, so the first use has to be a clone and the second can be a move. Cloning an `Arc` is very cheap.
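A small sketch of that clone-then-move pattern, using only the standard library. The `Runtime`, `Handle`, `NoopTracer`, and `with_tracer` names are hypothetical stand-ins, not the PR's real types; only the two assignment lines are taken from the diff.

```rust
use std::sync::Arc;

/// Hypothetical marker trait standing in for the tracer trait.
trait Tracer: Send + Sync {}

struct NoopTracer;
impl Tracer for NoopTracer {}

#[derive(Default)]
struct Handle {
    tracer: Option<Arc<dyn Tracer>>,
}

#[derive(Default)]
struct Runtime {
    io: Handle,
    cpu: Handle,
}

impl Runtime {
    /// Installs the same tracer on both handles.
    fn with_tracer(mut self, tracer: Arc<dyn Tracer>) -> Self {
        // First use: clone, which only bumps the reference count.
        self.io.tracer = Some(tracer.clone());
        // Second (last) use: move the original `Arc`, no extra clone needed.
        self.cpu.tracer = Some(tracer);
        self
    }
}

/// Returns how many `Arc` handles point at the tracer after installation.
fn strong_count_after_install() -> usize {
    let tracer: Arc<dyn Tracer> = Arc::new(NoopTracer);
    let runtime = Runtime::default().with_tracer(tracer);
    Arc::strong_count(runtime.io.tracer.as_ref().unwrap())
}
```

After installation there are exactly two strong references, one per handle, both pointing at the same allocation; the tracer itself is never copied.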

Kurtiscwright · 2026-05-28T22:46:33Z

+    /// The implementation must not alter the closure's return value.
+    fn trace_block(
+        &self,
+        f: Box<dyn FnOnce() -> Box<dyn Any + Send> + Send>,


Because this is blocking do we really need the Send trait?

xanderbailey · 2026-05-29

Tokio's `spawn_blocking` requires `Send` because the closure is handed to a thread pool, so it does cross a thread boundary. See https://docs.rs/tokio/latest/tokio/task/fn.spawn_blocking.html

Kurtiscwright · 2026-05-28T22:47:35Z

+) -> impl FnOnce() -> T + Send + 'static
+where
+    F: FnOnce() -> T + Send + 'static,
+    T: Send + 'static,


Same as above: why do we need `Send` when this is blocking?
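As an illustration of where those bounds come from, the sketch below builds a typed wrapper around a type-erased hook and then runs the result on another thread. It is an assumption-laden sketch rather than the PR's code: `Tracer`, `PassThrough`, `traced_block`, and `run_on_worker` are hypothetical, and `std::thread::spawn` is used in place of tokio's `spawn_blocking` (both require the closure and its result to be `Send + 'static`).

```rust
use std::any::Any;
use std::sync::Arc;
use std::thread;

type BoxedBlock = Box<dyn FnOnce() -> Box<dyn Any + Send> + Send>;

/// Hypothetical tracer trait with the type-erased blocking hook.
trait Tracer: Send + Sync {
    fn trace_block(&self, f: BoxedBlock) -> BoxedBlock;
}

/// Tracer that returns the closure unchanged.
struct PassThrough;

impl Tracer for PassThrough {
    fn trace_block(&self, f: BoxedBlock) -> BoxedBlock {
        f
    }
}

/// Typed front end: erase `T`, let the tracer wrap the closure, then recover `T`.
fn traced_block<F, T>(tracer: Arc<dyn Tracer>, f: F) -> impl FnOnce() -> T + Send + 'static
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // Erase the return type so the trait stays object-safe.
    let erased: BoxedBlock = Box::new(move || Box::new(f()) as Box<dyn Any + Send>);
    let wrapped = tracer.trace_block(erased);
    move || {
        // Downcast back to `T`; this only fails if a tracer broke its contract.
        *wrapped()
            .downcast::<T>()
            .expect("tracer must not alter the closure's return value")
    }
}

/// Runs a traced closure on a separate OS thread and returns its result.
fn run_on_worker() -> String {
    let tracer: Arc<dyn Tracer> = Arc::new(PassThrough);
    let task = traced_block(tracer, || format!("{}-{}", "manifest", 7));
    // The closure moves to another thread here, hence `Send + 'static`.
    thread::spawn(task).join().unwrap()
}
```

Dropping `Send` from any of the bounds makes the `thread::spawn` call fail to compile, which is the same constraint the earlier reply attributes to `spawn_blocking`: "blocking" describes what the closure does, not which thread it runs on.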

blackmwk · 2026-05-29T10:13:08Z

Thanks @xanderbailey for this PR, but I'm not convinced that this is the right direction to go. The motivation you mentioned has two parts:

1. Not relying on `tracing` for observability. This is only a benefit in theory. In practice, what users mostly care about is collecting spans into different sinks, not which library generates the spans. `tracing` is widely used and has a large ecosystem; what's more, we already rely on `tracing` for logging and method tracking.
2. Library consumers decide what to instrument. This is a good point, but I don't know what kind of information a library consumer could get from a `BoxFuture`. In fact, the tokio runtime already provides some instrumentation points: see `on_after_task_poll`, `on_before_task_poll`, etc. I think the context provided by tokio's instrumentation points is more useful than a `BoxFuture`, and that's the right direction to go, since the iceberg crate's `Runtime` is built on a tokio `Runtime`, which is created and maintained by the user.

xanderbailey · 2026-05-29T11:30:03Z

The strongest example is span propagation across spawn boundaries. When you call `tokio::spawn`, the tracing span context is severed: the spawned task doesn't inherit the parent span. So if you're tracing a table scan that spawns 50 IO tasks to fetch manifests and data files, those tasks appear as orphans in your Jaeger/Tempo traces rather than as children of the scan.
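The severed-context problem can be illustrated by analogy with plain threads. This sketch does not use tokio or `tracing`: a thread-local string plays the role of "the current span", `std::thread::spawn` plays the role of `tokio::spawn`, and `propagate` is a hypothetical spawn-site wrapper of the kind a tracer hook could apply. With the real `tracing` crate one would typically attach a span to the future via its `Instrument` trait instead.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    /// Stand-in for "the current span": a name stored per thread.
    static CURRENT_SPAN: RefCell<Option<String>> = RefCell::new(None);
}

/// Reads the span that is current on the calling thread.
fn current_span() -> Option<String> {
    CURRENT_SPAN.with(|s| s.borrow().clone())
}

/// Marks `name` as the current span on the calling thread.
fn enter_span(name: &str) {
    CURRENT_SPAN.with(|s| *s.borrow_mut() = Some(name.to_string()));
}

/// Captures the caller's span at the spawn site and reinstalls it in the task.
fn propagate<F, T>(f: F) -> impl FnOnce() -> T + Send + 'static
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let parent = current_span(); // evaluated on the spawning thread
    move || {
        CURRENT_SPAN.with(|s| *s.borrow_mut() = parent); // runs on the worker
        f()
    }
}

/// Returns what a bare task and a wrapped task each see as their span.
fn orphan_and_child() -> (Option<String>, Option<String>) {
    enter_span("table_scan");
    // Bare spawn: the worker starts with no span, so the task is an orphan.
    let orphan = thread::spawn(current_span).join().unwrap();
    // Wrapped spawn: the parent span is carried across the boundary.
    let child = thread::spawn(propagate(current_span)).join().unwrap();
    (orphan, child)
}
```

The bare task sees no parent, while the wrapped task sees `table_scan`. The point of a spawn-site hook is that this wrapping happens once inside the runtime rather than at each of the 50 call sites.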

A very similar conversation took place when this was introduced in DataFusion: apache/datafusion#14547

xanderbailey added 2 commits May 19, 2026 13:32

remove box

2802e33

docs

b56a343

Kurtiscwright reviewed May 28, 2026

xanderbailey wants to merge 3 commits into apache:main from xanderbailey:feat/runtime-tracer

4 participants
